Quick exploration of time projected graph visualization and path distances calculations

This worksheet has a draft implementation of the time projected network, and does some distance based calculations with it in order to cross-check the assumptions of the earliest-temporal path algorithm. It draws heavily on Jim Moody’s draft paper Static Representations of Dynamic Networks

library(tsna)
## Loading required package: network
## network: Classes for Relational Data
## Version 1.13.0 created on 2015-08-31.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##                     Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Martina Morris, University of Washington
##                     Skye Bender-deMoll, University of Washington
##  For citation information, type citation("network").
##  Type help("network-package") to get started.
## 
## Loading required package: networkDynamic
## 
## networkDynamic: version 0.9, created on 2015-11-24
## Copyright (c) 2015, Carter T. Butts, University of California -- Irvine
##                     Ayn Leslie-Cook, University of Washington
##                     Pavel N. Krivitsky, University of Wollongong
##                     Skye Bender-deMoll, University of Washington
##                     with contributions from
##                     Zack Almquist, University of California -- Irvine
##                     David R. Hunter, Penn State University
##                     Li Wang
##                     Kirk Li, University of Washington
##                     Steven M. Goodreau, University of Washington
##                     Jeffrey Horner
##                     Martina Morris, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("networkDynamic").
## 
## Loading required package: ergm
## Loading required package: statnet.common
## 
## ergm: version 3.6-14181.1-2015.11.12-22.35.00, created on 2015-11-12
## Copyright (c) 2015, Mark S. Handcock, University of California -- Los Angeles
##                     David R. Hunter, Penn State University
##                     Carter T. Butts, University of California -- Irvine
##                     Steven M. Goodreau, University of Washington
##                     Pavel N. Krivitsky, University of Wollongong
##                     Martina Morris, University of Washington
##                     with contributions from
##                     Li Wang
##                     Kirk Li, University of Washington
##                     Skye Bender-deMoll, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("ergm").
## 
## NOTE: If you use custom ERGM terms based on 'ergm.userterms'
## version prior to 3.1, you will need to perform a one-time update
## of the package boilerplate files (the files that you did not write
## or modify) from 'ergm.userterms' 3.1 or later. See
## help('eut-upgrade') for instructions.

function draft

Define a quick/crude implementation of the time projected graph with arbitrary discrete intervals/times. Does not implement the condensed/compressed version Jim suggests.

# create a crude version of discritized time projected graph from nD object

timeProjectedNetwork<-function(nd,start = NULL, end = NULL, time.increment = NULL, 
                     onsets = NULL, termini = NULL, ...){
  # TODO: initilize vertex pid to for case when of vertex dynamics change sizes of nets
  # slice up the dynamic network into a list of static networks
  nets<-get.networks(nd,start = start, end = end, time.increment = time.increment, 
                       onsets = onsets, termini = termini, ...)
  # init a new network of appropriate size
  projected<-network.initialize(length(nets)*network.size(nd))
  # TODO: map old vertex labels for debugging
  network.vertex.names(projected)<-unlist(network.vertex.names(nd),rep,network.size(nd))
  
  # loop over edges to create within-slice edges
  for(s in seq_along(nets)){
    el<-as.matrix.network.edgelist(nets[[s]])
    # multiply the vertex indices by s to give them for the next slice
    el<-el+(network.size(nd)*(s-1))
    add.edges(projected,tail=el[,1],head=el[,2])
    # handle un-directed edges by adding arc in each direction
    if(!is.directed(nd)){
      add.edges(projected,tail=el[,2],head=el[,1])
    }
  }
  # loop again to create between-slice vertex self ties
  # TODO: omit ties as indicated by vertex activity
  for(s in 1:(length(nets)-1)){
    tails<-1:network.size(nd)+network.size(nd)*(s-1)
    heads<-1:network.size(nd)+network.size(nd)*s
    add.edges(projected,tail=tails, head=heads ) 
  }
  return(projected)
}

This is pretty crude, doesn’t label stuff well, or tag the various edge types so they can be rendered differently.

Example time!

Moody’s 6 node example

Build the “tuning fork” network

m6<-network.initialize(6,directed=FALSE)
network.vertex.names(m6)<-LETTERS[1:6]
add.edges.active(m6,tail=1,head=2,onset=2,terminus=5)
add.edges.active(m6,tail=2,head=3,onset=3,terminus=7)
add.edges.active(m6,tail=3,head=4,onset=8,terminus=10)
add.edges.active(m6,tail=2,head=5,onset=0,terminus=1)
add.edges.active(m6,tail=5,head=6,onset=3,terminus=5)
plot(m6,displaylabels = TRUE,edge.label = get.edge.activity(m6))

Discretely re-sample it into a static network

m6projected<-timeProjectedNetwork(m6,start=0,end=10,time.increment=1)

plot with pre-defined grid coordinates for example

coords<-cbind(unlist(lapply(1:10,rep,6)),rep(1:6,10))
plot(m6projected,coord=coords,displaylabels=TRUE,jitter=FALSE)

The “Full multi-slice Representation” plot in Figure 2 of draft paper has a lot of implied vertex (in)-activity

plot with normal spatial layout algorithm

plot(m6projected,mode='kamadakawai',displaylabels = TRUE,vertex.cex=0.5)

Increase the temporal “resolution” by decreasing the bin size.

m6projectedFine<-timeProjectedNetwork(m6,start=0,end=10,time.increment=0.1)
coords<-cbind(unlist(lapply(1:100,rep,6)),rep(1:6,100))
plot(m6projectedFine,coord=coords,jitter=FALSE,
     vertex.cex=0.2, usearrows=FALSE)

Moody Contact Sim example

Try it out on the contact simulation example with varying times

First, lets do it with time steps of 100

data(moodyContactSim)
contactProjected100<-timeProjectedNetwork(moodyContactSim,start=0,end=1000,time.increment=100)
plot(contactProjected100,mode='kamadakawai',vertex.cex=0.5)

As Jim suggest, can also do it (more efficiently?) by only taking a network slices (bins) at the times that edges in the network actually change.

chChaChanges<-get.change.times(moodyContactSim)
contactProjectedChanges<-timeProjectedNetwork(moodyContactSim,onsets=chChaChanges,termini=chChaChanges)
plot(contactProjectedChanges,mode='kamadakawai',vertex.cex=0.5)

Newcomb example

This example starts as network list, so have to make into an nD object first, which is kind of silly

data(newcomb,fig.width=8)
## Warning in data(newcomb, fig.width = 8): data set '8' not found
newcombNd<-networkDynamic(network.list=newcomb)
## Neither start or onsets specified, assuming start=0
## Onsets and termini not specified, assuming each network in network.list should have a discrete spell of length 1
## Argument base.net not specified, using first element of network.list instead
## Created net.obs.period to describe network
##  Network observation period info:
##   Number of observation spells: 1 
##   Maximal time range observed: 0 until 14 
##   Temporal mode: discrete 
##   Time unit: step 
##   Suggested time increment: 1
newcombProjected<-timeProjectedNetwork(newcombNd)
plot(newcombProjected,mode='kamadakawai',vertex.cex=0.5)

This is pretty well connected, so the individual time slices hold together well. Problem is that I think this is the raw rank data for Newcomb, so everyone always has the the same degree, and you can’t see the structure. Need to threshold first.

Fun with 3D!!

Use carter’s rgl 3d implementation to reproduce Moody’s 3d layouts (have to actually run this to see it in R’s 3D GL viewer!)

3D ‘spatial’ layout

library(sna)
## sna: Tools for Social Network Analysis
## Version 2.3-2 created on 2014-01-13.
## copyright (c) 2005, Carter T. Butts, University of California-Irvine
##  For citation information, type citation("sna").
##  Type help(package="sna") to get started.
## 
## 
## Attaching package: 'sna'
## 
## The following object is masked from 'package:network':
## 
##     %c%
library(rglwidget)
library(rgl)
open3d(antialias=1,windowRect=c(0,0,500,500))
## glX 
##   1
gplot3d(contactProjectedChanges,edge.col='#55555555')
rglscene<-scene3d()
rgl.close()
rglwidget(rglscene)

(If RGL installed, will bring up a new 3d plot window. zoom in with mouse wheel, click and drag to rotate)

2D ‘spatial’ layout, plus 1D time projection

Should be pretty easy to use ndtv to generate smooth per-slice networks for force time oriented layout with each step as a layer.

First, load up data and construct the projected network

library(ndtv)
## Loading required package: animation
## 
## ndtv: version 0.8, created on 2015-10-14
## Copyright (c) 2015, Skye Bender-deMoll, University of Washington
##                     with contributions from
##                     Martina Morris, University of Washington
## Based on "statnet" project software (statnet.org).
## For license and citation information see statnet.org/attribution
## or type citation("ndtv").
data(short.stergm.sim)
shortStergProj<-timeProjectedNetwork(short.stergm.sim)
plot(shortStergProj,mode='kamadakawai')

Next, do a 2d layout on the original (not time-projected) network as if we were going to make a movie using the same bin intervals. Extract those coordinates, assemble into an x,y,z matrix, and pass it into the gplot3d command.

# compute coords for each slice
compute.animation(short.stergm.sim)
## No slice.par found, using
## slice parameters:
##   start:0
##   end:25
##   interval:1
##   aggregate.dur:1
##   rule:latest
## Calculating layout for network slice from time  0 to 1
## Calculating layout for network slice from time  1 to 2
## Calculating layout for network slice from time  2 to 3
## Calculating layout for network slice from time  3 to 4
## Calculating layout for network slice from time  4 to 5
## Calculating layout for network slice from time  5 to 6
## Calculating layout for network slice from time  6 to 7
## Calculating layout for network slice from time  7 to 8
## Calculating layout for network slice from time  8 to 9
## Calculating layout for network slice from time  9 to 10
## Calculating layout for network slice from time  10 to 11
## Calculating layout for network slice from time  11 to 12
## Calculating layout for network slice from time  12 to 13
## Calculating layout for network slice from time  13 to 14
## Calculating layout for network slice from time  14 to 15
## Calculating layout for network slice from time  15 to 16
## Calculating layout for network slice from time  16 to 17
## Calculating layout for network slice from time  17 to 18
## Calculating layout for network slice from time  18 to 19
## Calculating layout for network slice from time  19 to 20
## Calculating layout for network slice from time  20 to 21
## Calculating layout for network slice from time  21 to 22
## Calculating layout for network slice from time  22 to 23
## Calculating layout for network slice from time  23 to 24
## Calculating layout for network slice from time  24 to 25
## Calculating layout for network slice from time  25 to 26
# extract the coords it generated for each slice
xy<-lapply(0:24,function(t){
  cbind(x=get.vertex.attribute.active(short.stergm.sim,'animation.x',at=t),
        y=get.vertex.attribute.active(short.stergm.sim,'animation.y',at=t))
  })
# smoosh into 3D coord array
xy<-do.call(rbind,xy)
xyz<-cbind(xy,z=unlist(lapply(0:24,rep,network.size(short.stergm.sim))))
head(xyz)
##               x         y z
## [1,] -0.7350049 -2.397288 0
## [2,]  0.7517537 -2.682371 0
## [3,]  1.0661296  1.070898 0
## [4,] -0.2447175  2.636865 0
## [5,]  0.6671756  2.002633 0
## [6,]  1.9857179  1.201906 0
# make it 3d
gplot3d(shortStergProj,coord=xyz,jitter=FALSE)

So it kind of works, but definitely needs to be able to visually distinguish the within-slice and between slice edges. Not very easy to trace out paths of individual vertices, but can see groups, as long as interact enough to keep the 3d plot moving so my brain can percive the 3rd dimension.

1D ‘spatial’ layout, plus 1D time projection

Contrast with proximity timeline view of the same network. This does a 1D spatial layout at each timestep on the y-axis, and temporal ordering along the x-axis.

proximity.timeline(short.stergm.sim,mode='sammon',default.dist=10,labels.at=17)
## collapsing slice networks ...
## computing vertex positions using 1D sammon layout ...
##   computing positions for slice 1
##   computing positions for slice 2
##   computing positions for slice 3
##   computing positions for slice 4
##   computing positions for slice 5
##   computing positions for slice 6
##   computing positions for slice 7
##   computing positions for slice 8
##   computing positions for slice 9
##   computing positions for slice 10
##   computing positions for slice 11
##   computing positions for slice 12
##   computing positions for slice 13
##   computing positions for slice 14
##   computing positions for slice 15
##   computing positions for slice 16
##   computing positions for slice 17
##   computing positions for slice 18
##   computing positions for slice 19
##   computing positions for slice 20
##   computing positions for slice 21
##   computing positions for slice 22
##   computing positions for slice 23
##   computing positions for slice 24
##   computing positions for slice 25
## rendering splines for each vertex ...

I think this may be a bit easier to trace out individual vertex paths and see the integration of isolates and formation / seperation of groups. But the actuall times when vertices are tied are not directly plotted (only implied via proximity of lines)

In general, I find that all of these plotting techniques are not very generalizeable across classes of networks. A technique may work beautifully on one example, but fail misrably on second because some un-realized constraint is violated (i.e. planarity, maximum density, conectedness, etc.)

Does the descrete time path solution aproach exact as time interval decreases?

Enough pretty pictures …

Hypothesis: if we evaluate at every change step, geodesic distances on the time-projected graph will match earliest times in dynamic graph. If we use discrete bin intervals they will be different. But as bin intervals get smaller, it will approach the exact solution.

Compute the shortest path geodesics from the vertices in the first ‘wave’ to the vertices in the last ‘wave’. In order to make sure the paths are exactly comparable, we use the version where we generated a new ‘wave’ at each of the 35 points where anything changed in the network.

# this is the overall matrix
geodistMoodyContactChanges<-geodist(contactProjectedChanges)$gdist
# extract the distances between first and last wave
gdistChanges<-geodistMoodyContactChanges[1:16,545:560]
gdistChanges
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
##  [1,]   34   36  Inf   36  Inf   36  Inf   36   35   Inf    35    35   Inf
##  [2,]   36   34   36   36   36   36   36   36   37   Inf    37    37    35
##  [3,]  Inf  Inf   34  Inf   36  Inf   36   36  Inf   Inf    37   Inf    35
##  [4,]   38   36   36   34   36   38   36   36   39    35    37    39    35
##  [5,]  Inf  Inf  Inf  Inf   34  Inf   36   36  Inf   Inf    37   Inf    35
##  [6,]  Inf  Inf  Inf   36  Inf   34  Inf  Inf  Inf   Inf   Inf   Inf   Inf
##  [7,]  Inf  Inf  Inf  Inf  Inf  Inf   34  Inf  Inf    35   Inf   Inf    35
##  [8,]  Inf  Inf  Inf  Inf   36  Inf   36   34  Inf   Inf    35   Inf    35
##  [9,]   35  Inf  Inf  Inf  Inf  Inf  Inf  Inf   34   Inf   Inf   Inf   Inf
## [10,]   39   37   37   35   37   39   35   37   40    34    38    40    36
## [11,]   35   37  Inf   37  Inf   37  Inf   35   36   Inf    34    36   Inf
## [12,]   35  Inf  Inf  Inf  Inf  Inf  Inf  Inf   36   Inf   Inf    34   Inf
## [13,]   37   35   35   35   35   37   35   35   38   Inf    36    38    34
## [14,]  Inf  Inf  Inf   35  Inf  Inf  Inf  Inf  Inf   Inf   Inf   Inf   Inf
## [15,]  Inf   35  Inf  Inf  Inf  Inf  Inf  Inf  Inf   Inf   Inf   Inf   Inf
## [16,]   35   35  Inf   35  Inf   35  Inf  Inf   36   Inf   Inf    36   Inf
##       [,14] [,15] [,16]
##  [1,]   Inf    37    35
##  [2,]   Inf    35    35
##  [3,]   Inf   Inf   Inf
##  [4,]    35    37    35
##  [5,]   Inf   Inf   Inf
##  [6,]   Inf   Inf    35
##  [7,]   Inf   Inf   Inf
##  [8,]   Inf   Inf   Inf
##  [9,]   Inf   Inf   Inf
## [10,]    36    38    36
## [11,]   Inf    38    36
## [12,]   Inf   Inf   Inf
## [13,]    36    36    36
## [14,]    34   Inf    36
## [15,]   Inf    34   Inf
## [16,]   Inf    36    34

(TODO: It is actually pretty wasteful to calculate the All Pairs Shortest Path on the projected network, since we actually only care about the distances from the first wave to the last, and it is evaluating all the in between ones as well. Be nice if I could tell the sna geodist function to only calculate paths between vertices 1:16,545:560 since we will subset to this anyway. )

Also compute the temporal distance of earliest arriving temporal path. Evaluate the tPath distances from each vertex at t=0 and aggregate into a matrix.

tdist<-t(sapply(1:16,function(v){
  tPath(moodyContactSim,v=v)$tdist
}))
tdist
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
##  [1,]    0  543  Inf  708  Inf  661  Inf  709  634   Inf   174   581   Inf
##  [2,]  543    0  594  708  672  661  729  679  634   Inf   709   581   454
##  [3,]  Inf  Inf    0  Inf  672  Inf  729  679  Inf   Inf   709   Inf   594
##  [4,]  543  454  594    0  672  661  184  679  634     0   709   581   413
##  [5,]  Inf  Inf  Inf  Inf    0  Inf  729  679  Inf   Inf   709   Inf   672
##  [6,]  Inf  Inf  Inf  708  Inf    0  Inf  Inf  Inf   Inf   Inf   Inf   Inf
##  [7,]  Inf  Inf  Inf  Inf  Inf  Inf    0  Inf  Inf   184   Inf   Inf   729
##  [8,]  Inf  Inf  Inf  Inf  679  Inf  729    0  Inf   Inf   709   Inf   679
##  [9,]  634  Inf  Inf  Inf  Inf  Inf  Inf  Inf    0   Inf   Inf   Inf   Inf
## [10,]  543  454  594    0  672  661  184  679  634     0   709   581   413
## [11,]  174  543  Inf  708  Inf  661  Inf  709  634   Inf     0   581   Inf
## [12,]  581  Inf  Inf  Inf  Inf  Inf  Inf  Inf  634   Inf   Inf     0   Inf
## [13,]  543  454  594  413  672  661  729  679  634   Inf   709   581     0
## [14,]  Inf  Inf  Inf  625  Inf  Inf  Inf  Inf  Inf   Inf   Inf   Inf   Inf
## [15,]  Inf  669  Inf  Inf  Inf  Inf  Inf  Inf  Inf   Inf   Inf   Inf   Inf
## [16,]  543  535  Inf  708  Inf  661  Inf  Inf  634   Inf   Inf   581   Inf
##       [,14] [,15] [,16]
##  [1,]   Inf   669   543
##  [2,]   Inf   669   535
##  [3,]   Inf   Inf   Inf
##  [4,]   625   669   535
##  [5,]   Inf   Inf   Inf
##  [6,]   Inf   Inf   661
##  [7,]   Inf   Inf   Inf
##  [8,]   Inf   Inf   Inf
##  [9,]   Inf   Inf   Inf
## [10,]   625   669   535
## [11,]   Inf   669   543
## [12,]   Inf   Inf   Inf
## [13,]   625   669   535
## [14,]     0   Inf   708
## [15,]   Inf     0   Inf
## [16,]   Inf   669     0

First level of check: Do they find the same set of reachable vertices? Check that all the Inf match up between the matrices?

all(is.infinite(gdistChanges)==is.infinite(tdist))
## [1] TRUE

Great! However, it is a bit hard to directly compare distances with these versions, because the interval of time between waves in the ‘changes’ version are not the same at each time step. Also the paths selected to get there may not be the same..

Plot all the the distances/times for each cell in the matrix against each other

plot(as.numeric(gdistChanges),as.numeric(tdist))

As expected, they don’t line up exactly. Note also that the x-axis begins at 34, because the shortest path distance from a vertex at t=0 to itself at t=1000 is 34 identity-arc hops. We can subtract 34 from the distancs to make them eaiser to compare.

What about if we compare the number of steps in the lengths of the paths they find

tdistGHops<-t(sapply(1:16,function(v){
  tPath(moodyContactSim,v=v)$gsteps
}))


plot(as.numeric(gdistChanges)-34,as.numeric(tdistGHops),main='tdist hops vs changes hops')

Which paths are different?

tdistGHops==gdistChanges-34
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
##  [1,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
##  [2,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
##  [3,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
##  [4,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
##  [5,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
##  [6,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
##  [7,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
##  [8,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
##  [9,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
## [10,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
## [11,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
## [12,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
## [13,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
## [14,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
## [15,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
## [16,] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE  TRUE  TRUE  TRUE  TRUE
##       [,14] [,15] [,16]
##  [1,]  TRUE  TRUE  TRUE
##  [2,]  TRUE  TRUE  TRUE
##  [3,]  TRUE  TRUE  TRUE
##  [4,]  TRUE  TRUE FALSE
##  [5,]  TRUE  TRUE  TRUE
##  [6,]  TRUE  TRUE  TRUE
##  [7,]  TRUE  TRUE  TRUE
##  [8,]  TRUE  TRUE  TRUE
##  [9,]  TRUE  TRUE  TRUE
## [10,]  TRUE  TRUE FALSE
## [11,]  TRUE  TRUE  TRUE
## [12,]  TRUE  TRUE  TRUE
## [13,]  TRUE  TRUE  TRUE
## [14,]  TRUE  TRUE  TRUE
## [15,]  TRUE  TRUE  TRUE
## [16,]  TRUE  TRUE  TRUE

Looks like 4 to 16, and 10 to 16

tdistGHops[4,16]
## [1] 3
gdistChanges[4,16]-34
## [1] 1
tdistGHops[10,16]
## [1] 4
gdistChanges[10,16]-34
## [1] 2

Plot our favorite example network to see what is going on

plot(moodyContactSim,
     displaylabels=TRUE,
     label.cex=0.8,
     label.pos=5,
     vertex.col='white',
     vertex.cex=3,
     edge.label=sapply(get.edge.activity(moodyContactSim),function(e){
       paste('(',e[,1],'-',e[,2],')',sep='')
     }),
     edge.label.col='blue',
     edge.label.cex=0.7,
     main='moodyContactSim network'
   )

Both of theses cases are where the shortest temporal path does not equal the earliest temporal path. The path with the fewest number of graph hops arrives later. For example, 4-16 is the shortest path (arriving at t=748), but the earliest is 4-13-2-16 (arriving at t=575).

Construct a version where we aggregate the network in 10 waves of 100 time units. Pretty sure that this is going to cause some distortion, so this will be the straw-man example.

# this is the overall matrix
geodistMoodyContact100<-geodist(contactProjected100)$gdist
# extract the distances between first and last wave
gdist100<-geodistMoodyContact100[1:16,145:160]

And a version where we aggregate in 20 waves of 50 time units

# this is the overall matrix
contactProjected50<-timeProjectedNetwork(moodyContactSim,time.increment=50)
geodistMoodyContact50<-geodist(contactProjected50)$gdist
# extract the distances between first and last wave
gdist50<-geodistMoodyContact50[1:16,305:320]
plot(as.numeric(gdist100),as.numeric(gdist50),main='100step aggregation vs 50step')

In this version, the times align (they differ by constant multiple because the we have not weighted the identity edges in any way to account for the differences in temporal length)

Lets do it so we construct a time projected network with a 1000 waves, one for every single discrete time point (this will take a while)

# this is the overall matrix
contactProjected1<-timeProjectedNetwork(moodyContactSim,time.increment=1) # <-- this object is 13Mb
geodistMoodyContact1<-geodist(contactProjected1)$gdist

Can’t run version above because it gives “Error: cannot allocate vector of size 1.9Gb” when it tries to allocate the matrix for APSP geodist. Since we can’t make matrices that big, lets try every 5 time steps as an approximation.

# this is the overall matrix
contactProjected5<-timeProjectedNetwork(moodyContactSim,time.increment=5) # <-- this object is 2.6 Mb
geodistMoodyContact5<-geodist(contactProjected5)$gdist
# extract the distances between first and last wave
gdist5<-geodistMoodyContact5[1:16,3185:3200]

compare time increment 5 to 100

plot(as.numeric(gdist100),as.numeric(gdist5),main='100step aggregation vs 5step')

compare 5 to ‘changes’ version

plot(as.numeric(gdist5),as.numeric(gdistChanges),main='5step aggregation vs exact changes')

The 5-time unit aggregation matches the “exact” (changes) version.

plot(as.numeric(gdist100),as.numeric(gdistChanges), main='100step aggregation vs exact changes')

But, not surprisingly, the 100 time unit aggregation, seems to under-estimate some distances, presumably because the aggregation makes some paths seem feasible that are not. This makes sense, as the duration of most edges is ~30 in this simple example.

The 100 unit aggregation doesn’t just get the paths lengths wrong, it also over-estimates the forward reachable set.

sum(!is.infinite(gdist100)) # how many non-Inf distances are there?
## [1] 147
sum(!is.infinite(gdist5))  # how many non-Inf distances are there?
## [1] 121

So,

  • The aggregation bin size we choose can impact lengths of paths and rechable set found
  • the reachable sets found by the geodesic distances BFS on the time projected network method and the earliest temporal distances DFS seem to give the same reachable sets (provided we use a small-enough bin size, or exact (‘changes’) bins bins for time projected method)
  • The paths found may differ between the methods

Thoughts:

As always when slicing up a continuous network into discrete samples, we have the choice between sampling ‘at’ a time point or aggregating over the entire window between samples (the default in the examples above, with the exception of changes). The former will nearly always underestimate ‘true’ connectivity because it can miss ties that occur between sampling periods, and the later will overestimate (as Jim mentions) because it will collapse the relative ordering of ties occurring within the bin, making them appear concurrent when they may not be.

Making the network discrete is generally going to bias the temporal information one way or another. So the time projected is probably more useful if you have data that is only collected in a few survey waves. However, if the focus is transmission phenomena, and we collect data in waves (w/o asking for explicit timing on ties within waves), and the survey times are far enough apart that there is more than one tie change per wave, we have already lost some of the ordering information needed for transmission anyway and maybe shouldn’t be using these metrics.

I think that the time projected network is extremely useful for pedagogical purpose and thinking through various measures.

But I’d also argue that as we increase the accuracy by letting the time interval between slices approach the limit of zero, and the redundent information is removed via compression, the representation is becoming closer and closer to the continuous time spell-based representation used in networkDynamic. And in general the spell representation gives an efficient way to compute things since we can build algorithms that can use the spell information to operate as if we have constructed the time projected network without allocating giant matrices.